A Combinatorial Algorithm to Compute Regularization Paths



Bernd Gärtner
ETH Zurich, Switzerland 
gaertner@inf.ethz.ch 



Martin Jaggi 
ETH Zurich, Switzerland 
jaggi@inf.ethz.ch 



Joachim Giesen
Friedrich-Schiller-Universität Jena, Germany
giesen@informatik.uni-jena.de

Torsten Welsch
Friedrich-Schiller-Universität Jena, Germany



ABSTRACT 

For a wide variety of regularization methods, algorithms 
computing the entire solution path have been developed re- 
cently. Solution path algorithms do not only compute the 
solution for one particular value of the regularization param- 
eter but the entire path of solutions, making the selection 
of an optimal parameter much easier. Most of the currently 
used algorithms are not robust in the sense that they can- 
not deal with general or degenerate input. Here we present 
a new robust, generic method for parametric quadratic pro- 
gramming. Our algorithm directly applies to nearly all ma- 
chine learning applications, where so far every application 
required its own different algorithm. 

We illustrate the usefulness of our method by applying it to
a very low rank problem which could not be solved by existing
path tracking methods, namely to compute part-worth
values in choice based conjoint analysis, a popular technique
from market research to estimate consumers' preferences on
a class of parameterized options.

1. INTRODUCTION

We study a combinatorial algorithm to solve parameterized
quadratic programs, i.e., to compute the whole solution
path. Unlike other methods employed in machine learning,
our algorithm can deal with singular objective function
matrices, without perturbing the input. Regularization
methods resulting in parameterized quadratic programs have
successfully been applied in many optimization, classification
and regression tasks in a variety of areas such as
signal processing, statistics, biology, surface reconstruction
and information retrieval. We will briefly review some
applications here, and we will also study another application,
namely choice based conjoint analysis, in more detail.
Conjoint analysis comprises a popular family of techniques
mostly used in market research to assess consumers' preferences
on a set of options that are specified by multiple parameters,
see [19] for an overview and recent developments.



We will show that a regularization approach to the analysis 
of preference data leads to a parameterized quadratic pro- 
gram with a sparse, low rank positive semi-definite matrix 
describing the quadratic term of the objective function. 

1.1 Contributions and Related Work 

Solution Path Algorithms in Machine Learning. An algorithm
to compute the entire regularization path of the
C-SVM has originally been reported by Hastie et al. [20].
[10] gave such an algorithm for the LASSO, and later [26]
and [25] proposed solution path algorithms for ν-SVM and
one-class SVM respectively. Also Receiver Operating Characteristic
(ROC) curves of SVM were solved by such methods [3].
Support vector regression (SVR) is interesting as
its underlying quadratic program depends on two parameters,
a regularization parameter (for which the solution path
was tracked by [18], [36], [26]) and a tube-width parameter (for
which [35] recently gave a solution path algorithm). See also
[30] for a recent overview.

As Hastie et al. [20] point out, one drawback of their algorithm
for the two-class SVM is that it does not work for
singular kernel matrices: it requires that all principal
submatrices of the kernel matrix that occur in the course of
the algorithm be invertible. The same is required by the
other existing path algorithms mentioned above. However,
large kernel matrices often have very low numerical rank,
even in those cases where radial basis function kernels are
used [20, Section 5.1], and of course also in the case of linear
SVMs with sparse features, such as in the application
to conjoint analysis discussed in this paper. The inability
to deal with singular sub-matrices is probably one of the
main reasons that none of the above mentioned algorithms
could so far be applied effectively to medium or larger scale
problems [20], [30]. [30, Section 4.2] report that their algorithm
terminates prematurely on 3 × 3 matrices due to this
problem.

By observing that all the above mentioned algorithms
report the solution paths of parametric quadratic programs
of the form (1), we point out that it is in fact
not necessary to use different algorithms for each problem
variant. Generic algorithms have been known for quite some
time [27], [29], [4], [5], [17], [33], but have interestingly not yet
received broader attention in the area of machine learning.

One goal of this paper is to popularize the generic solution 
algorithms for parametric quadratic programming, because 
we think that they have some major advantages: 



• The same algorithm can be applied to any solution
path problem that can be written in the form (1),
which includes all of [20], [10], [18], [36], [3], [26], [25], [11],
[35].



• Many of the known generic algorithms can deal with 
all inputs; in particular the algorithms can cope with 
singular sub-matrices in the objective function. 

• There is significant existing literature on the perfor- 
mance, numerical stability, and complexity of the generic 
algorithms. 

• Our criss-cross algorithm is numerically more stable, 
and also more robust in the sense that small errors do 
not add up while tracking the solution path. Also, such 
algorithms are faster for sparse problems as in linear 
SVMs and conjoint analysis, because they do not need 
any matrix inversions. 

Comparison with other ways to deal with degeneracies.
Instead of using our generic criss-cross method,
another obvious way to avoid degeneracies caused by singular
sub-matrices in the objective function is to add a small
value ε to each diagonal entry of the original matrix Q;
subsequently, all simple methods for the regular case such as
[20], [30] can be used. There are several problems with this
approach. First of all, the rank of the objective function
matrix is blown up artificially, and the potential of using
efficient small-rank-QP methods would be wasted. Secondly,
the solution path of the perturbed problem may differ
substantially from that of the original problem; in particular,
the perturbation may lead to a much higher number of bends
and therefore higher tracking cost, and the computed solutions
could be far off the real solutions. In contrast, our
criss-cross method avoids all these issues, since it always solves
the original unperturbed problem.
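
The first objection can be illustrated with a minimal NumPy sketch. The matrix below is a hypothetical rank-1 PSD matrix built from a single made-up feature vector, and the perturbation size `eps` is an arbitrary assumption; neither is taken from the experiments in this paper.

```python
import numpy as np

# Hypothetical toy data: a rank-1 PSD matrix Q = f f^T from one feature vector.
f = np.array([1.0, 2.0, 3.0])
Q = np.outer(f, f)

eps = 1e-3                              # assumed perturbation size
Q_perturbed = Q + eps * np.eye(3)       # add eps to every diagonal entry

# An explicit tolerance keeps the numerical rank decision unambiguous.
rank_Q = np.linalg.matrix_rank(Q, tol=1e-9)
rank_Q_perturbed = np.linalg.matrix_rank(Q_perturbed, tol=1e-9)
print(rank_Q, rank_Q_perturbed)
```

The perturbed matrix has full rank, so any method that exploits the low rank of Q can no longer do so.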

2. PARAMETRIC QUADRATIC PROGRAMMING

A quadratic program (QP) is the problem of minimizing 
a convex quadratic function subject to linear equality and 
inequality constraints. Here, we are interested in parame- 
terized quadratic programs (pQPs) of the (standard) form 



\[
\mathrm{QP}(\mu)\qquad
\begin{array}{ll}
\text{minimize}_{x} & x^T Q x + c(\mu)^T x \\
\text{subject to} & Ax \ge b(\mu) \\
 & x \ge 0,
\end{array}
\tag{1}
\]



where $c : \mathbb{R} \to \mathbb{R}^n$ and $b : \mathbb{R} \to \mathbb{R}^m$ are functions that
describe how the linear term of the objective function and
the right-hand side of the constraints vary with some real
parameter $\mu$. $Q$ is an $n \times n$ symmetric positive semidefinite
(PSD) matrix (the quadratic term of the objective
function), $c$ is an $n$-vector (the linear term of the objective
function), $A$ is an $m \times n$ matrix (the constraint matrix), and
$b$ is an $m$-vector (the right-hand side of the constraints).



Our goal is to solve a given problem QP($\mu$) of the form (1)
for all $\mu$ in a given interval $[\mu_{\min}, \mu_{\max}]$, where we assume
for now that QP($\mu$) has an optimal solution for all $\mu$ in
that interval (the general case is easy to handle as well, see
the remark in the "Odds and Ends" paragraph below). In
other words, given any value of $\mu \in [\mu_{\min}, \mu_{\max}]$, we want to
retrieve an optimal solution $x^*$ to QP($\mu$) quickly, without
having to solve the problem from scratch. The task of solving
such a problem for all possible values of the parameter $\mu$
is called parametric quadratic programming. What we want
as output is a solution path, an explicit function $x^* : \mathbb{R} \to \mathbb{R}^n$
that describes the solution as a function of the parameter $\mu$.

It is well known that the solution path $x^*$ is piecewise linear
if $c$ and $b$ are linear functions of $\mu$, see for example [28].
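
As a small worked illustration (a toy instance chosen here for exposition, not one of the applications studied in this paper), take $n = m = 1$ with $Q = 1$, $c(\mu) = 0$, $A = 1$ and $b(\mu) = \mu$ in (1):

\[
\mathrm{QP}(\mu):\quad \text{minimize}_{x}\ x^2 \quad \text{subject to}\quad x \ge \mu,\ x \ge 0,
\qquad\text{so that}\qquad
x^*(\mu) = \max(\mu, 0).
\]

The solution path is linear on each of the two pieces $\mu \le 0$ and $\mu \ge 0$, with a single bend at $\mu = 0$.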



2.1 Regularization Methods and pQPs 

A variety of machine learning methods, in particular many
regularization methods, are direct instances of parametric
quadratic programming. Examples include support vector
machines [6], support vector regression [32], the LASSO,
surface reconstruction [31], $\ell_1$-regularized least squares,
and compressed sensing [13].

Let us briefly describe the support vector machine as a
popular example of a pQP that results from regularization.
Later we will re-discover the corresponding pQP in the context
of choice based conjoint analysis.



Support Vector Machine. The support vector machine (SVM) 
is a standard tool for two-class classification problems. In 
Section 4 we will see that estimating part-worth values in
choice based conjoint analysis can be seen as a problem that 
is geometrically dual to binary classification. The primal 
soft margin C-SVM is the following pQP: 



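\[
\begin{array}{ll}
\text{minimize}_{w,b,\xi} & \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
\text{subject to} & y_i(w^T x_i + b) \ge 1 - \xi_i \\
 & \xi_i \ge 0,
\end{array}
\tag{2}
\]

where $y_i \in \{\pm 1\}$ is the class label of data point $x_i$ and $C$ is
the regularization parameter. The dual of the soft margin
C-SVM is the following pQP (observe that the regularization
parameter moves from the objective function to the
constraints):

\[
\begin{array}{ll}
\text{maximize}_{\alpha} & \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
\text{subject to} & \sum_i y_i \alpha_i = 0 \\
 & 0 \le \alpha_i \le C.
\end{array}
\tag{3}
\]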



3. THE CRISS-CROSS METHOD FOR PQPS 

Next we will present a new generic algorithm that uses LCP
techniques; in contrast to Murty's method [27], it uses the
extremely simple and elegant criss-cross method as a subroutine,
resulting in what we believe is the simplest generic
algorithm that is able to deal with arbitrary PSD matrices.

The algorithm works in principle for more general continuous
functions $c$. The main idea is to transform (1) to a parametric
linear complementarity problem (LCP), and then use
the criss-cross method to quickly update the solution while
$\mu$ varies.



3.1 The LCP Formulation 

Let us recall the Karush-Kuhn-Tucker optimality conditions
for quadratic programs, see e.g. [8, Section 2.8].

Theorem 1. An $n$-vector $x$ is an optimal solution to (1)
if and only if there exists an $n$-vector $u$ as well as $m$-vectors
$y$ and $v$ such that

\[
\begin{array}{llcl}
\text{(i)} & v = Ax - b(\mu) \ge 0 & \text{and} & x \ge 0, \\
\text{(ii)} & u = c(\mu) - A^T y + 2Qx \ge 0 & \text{and} & y \ge 0, \\
\text{(iii)} & x^T u = 0 & \text{and} & y^T v = 0,
\end{array}
\]

where (i) encodes primal feasibility of $x$, (ii) encodes dual
feasibility of $y$, and (iii) is referred to as complementary
slackness.

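The three conditions of the previous theorem can be rewritten
in the form

\[
\begin{array}{rcl}
w - Mz & = & q(\mu) \\
w, z & \ge & 0 \\
w^T z & = & 0,
\end{array}
\tag{4}
\]

where $w^T = (u^T, v^T)$, $z^T = (x^T, y^T)$, $q(\mu)^T = (c(\mu)^T, -b(\mu)^T)$
and
\[
M = \begin{pmatrix} 2Q & -A^T \\ A & 0 \end{pmatrix}.
\]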

Problem (4) is a linear complementarity problem (LCP) with
a PSD matrix (for all $w$, we have $w^T M w = 2u^T Q u \ge 0$;
this is why we have chosen the constraints to be "$Ax \ge b$"
instead of the more common "$Ax \le b$"; the latter would
lead to a symmetric but not necessarily positive semidefinite
matrix $M$ in the LCP (4)). In order to solve (1), we will
therefore find $w^*$ and $z^*$ that satisfy (4); then the first $n$
components of the $(n + m)$-vector $z^*$ form a solution to (1).
This reduction of QP to LCP is well-known, see e.g. 
Section 1.2]. 
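
The reduction can be written down in a few lines. The following is a minimal NumPy sketch for a fixed parameter value; the function name `build_lcp` and the toy data are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def build_lcp(Q, A, c, b):
    """Return (M, q) for the LCP  w - M z = q,  w, z >= 0,  w^T z = 0,
    obtained from  min x^T Q x + c^T x  s.t.  A x >= b, x >= 0."""
    Q = np.asarray(Q, dtype=float)
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    # M = [[2Q, -A^T], [A, 0]],  q = (c, -b)
    M = np.block([[2.0 * Q, -A.T],
                  [A, np.zeros((m, m))]])
    q = np.concatenate([np.asarray(c, dtype=float),
                        -np.asarray(b, dtype=float)])
    return M, q

# Hypothetical toy instance with a singular PSD matrix Q (n = 2, m = 1).
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
A = np.array([[1.0, 1.0]])
c = np.array([-4.0, 1.0])
b = np.array([1.0])
M, q = build_lcp(Q, A, c, b)

# For any z = (x, y), the skew blocks cancel: z^T M z = 2 x^T Q x >= 0.
z = np.array([0.5, -1.0, 2.0])
quad_form = float(z @ M @ z)
```

Note that M is PSD in this sense even though Q itself is singular, which is the property the criss-cross method relies on.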

3.2 The Criss-Cross Method 

The criss-cross method is a combinatorial method for finding
vectors $w$ and $z$ that satisfy (4), given that $q = q(\mu)$ is
fixed (we address the case of varying $\mu$ below). The method
is guaranteed to terminate (with a solution, or a proof of
infeasibility of (4)), given that $M$ is a sufficient matrix, see
e.g. [14]. This matrix class contains all PSD matrices, meaning
that the criss-cross method is applicable in our setting.
Our description below is for the special case of PSD matrices.

The criss-cross method is an iterative method that goes
through a sequence of basic solutions. To define such a solution,
we consider any subset $B \subseteq [k]$, $k := n + m$, and the
matrix $M_B$ whose $j$-th column is the $j$-th column $I_j$ of the
$k \times k$ identity matrix $I$ (if $j \in B$), or the $j$-th column of
$-M$ (if $j \notin B$). $B$ is called a basis if $M_B$ is invertible. For
example, $B = [k]$ is a basis since $M_{[k]} = I$.

Given a basis, we obtain the corresponding basic solution as
the unique solution of the following system of equations:

\[
\begin{array}{rcll}
z_j & = & 0, & j \in B, \\
w_j & = & 0, & j \notin B, \\
w - Mz & = & q.
\end{array}
\]

This indeed has a unique solution, since substitution of the
first two sets of equations into $w - Mz = q$ yields the system
$M_B \lambda_B = q$, where $\lambda_j = w_j$ if $j \in B$ and $\lambda_j = z_j$ otherwise.

It is clear that every basic solution $(w, z)$ satisfies $w^T z = 0$,
but $w, z \ge 0$ may not hold. The criss-cross method tries
to rectify this by repeatedly moving to another basis and
corresponding basic solution, until $w, z \ge 0$, in which case
the LCP is solved.

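Given a basis $B$ along with $\lambda^*_B$ (the unique solution of $M_B \lambda_B = q$),
one step of the method works as follows. If $\lambda^*_B \ge 0$,
we are done; otherwise, choose the smallest index $r$ such that
$\lambda^*_r < 0$. With respect to $B$, the system $w - Mz = q$ can
be written as $M_B \lambda_B + M_N \lambda_N = q$, where $N = [k] \setminus B$
(so $\lambda_N$ collects the complementary variables: $z_j$ for $j \in B$
and $w_j$ for $j \notin B$). Consequently,

\[
\lambda_B = M_B^{-1} q - M_B^{-1} M_N \lambda_N
\]

for all solutions of $w - Mz = q$ (the basic solution associated
with $B$ is obtained from $\lambda_N = 0$).

Let the $k \times k$ matrix $\Lambda = -M_B^{-1} M_N$ be the dictionary
associated with $B$, so that we have

\[
\lambda_B = M_B^{-1} q + \Lambda \lambda_N. \tag{5}
\]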

There are now two cases: 

(a) $\Lambda_{rj} \le 0$ for all $j \in [k]$. By (5) we have

\[
(\lambda_B)_r = \lambda^*_r + \Lambda_r \lambda_N
\]

for all solutions of $w - Mz = q$, where $\Lambda_r$ is the $r$-th
row of $\Lambda$. But since this yields $(\lambda_B)_r < 0$ whenever
$\lambda_N \ge 0$, there can't be any solution to $w - Mz = q$
with $w, z \ge 0$, and we can conclude that the LCP is
infeasible.

(b) $\Lambda_{rj} > 0$ for some $j \in [k]$. Choose the smallest index $s$
such that $\Lambda_{rs} > 0$ and set $p := \max(r, s)$. If $\Lambda_{pp} \ne 0$,
update $B$ to $B' := B \oplus \{p\}$ (diagonal pivot), otherwise
update $B$ to $B' := B \oplus \{r, s\}$ (exchange pivot), where
$\oplus$ denotes symmetric set difference.



Lemma 2. The set B' resulting from step (b) is again a 
basis. 



PROOF. In general, if $B' = B \oplus D$, then $M_{B'}$ is obtained
from $M_B$ by replacing the columns whose indices are in $D$
with the corresponding columns of $M_N$. This update can be
written as

\[
M_{B'} = M_B T,
\]

where $T_j = I_j$ if $j \notin D$ and $T_j = (M_B^{-1} M_N)_j = -\Lambda_j$ for
$j \in D$. Moreover, since $M_B$ was invertible, $M_{B'}$ is invertible
if and only if $\det(T) \ne 0$. If $D = \{p\}$ (the diagonal pivot),
we get $\det(T) = -\Lambda_{pp} \ne 0$. If $D = \{r, s\}$ (the exchange
pivot), we assume w.l.o.g. $r < s = p$ and get, using $\Lambda_{ss} = \Lambda_{pp} = 0$,

\[
\det(T) = \det \begin{pmatrix} -\Lambda_{rr} & -\Lambda_{rs} \\ -\Lambda_{sr} & 0 \end{pmatrix} = -\Lambda_{rs} \Lambda_{sr}.
\]

In order to evaluate this, we need one observation concerning
the structure of $\Lambda = -M_B^{-1} M_N$. Let us call a square
matrix bisymmetric if it is of the form

\[
\begin{pmatrix} Q & -A^T \\ A & P \end{pmatrix},
\]

where both $Q$ and $P$ are symmetric. For example, $M = -M_{[k]}^{-1} M_{\emptyset}$
is bisymmetric, but simple calculations show that
$\Lambda = -M_B^{-1} M_N$ is also bisymmetric, hence $\Lambda_{sr} = -\Lambda_{rs} < 0$,
which implies $\det(T) > 0$. □



This method is due to Klafszky and Terlaky [24], who also show
that it terminates after having gone through a finite number
of bases.
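
The steps described above can be sketched in NumPy as follows. This is a dense, unoptimized illustration and not the authors' implementation: the function names, the 0-based indexing, the tolerance `tol` and the iteration cap are assumptions made here, and the basis system is re-solved from scratch in every step instead of being updated.

```python
import numpy as np

def basis_matrix(M, B):
    """M_B: column j is the unit vector I_j if j is in B, else column j of -M."""
    k = M.shape[0]
    MB = -np.array(M, dtype=float)
    I = np.eye(k)
    for j in B:
        MB[:, j] = I[:, j]
    return MB

def criss_cross(M, q, B=None, tol=1e-12, max_iter=10_000):
    """Least-index criss-cross for  w - M z = q,  w, z >= 0,  w^T z = 0.
    Returns (B, w, z), or None if case (a) certifies infeasibility.
    Passing a basis B gives a warm start from that basis."""
    k = M.shape[0]
    B = set(range(k)) if B is None else set(B)
    for _ in range(max_iter):
        MB = basis_matrix(M, B)
        lam = np.linalg.solve(MB, q)              # basic solution lambda*_B
        negative = np.flatnonzero(lam < -tol)
        if negative.size == 0:                    # w, z >= 0: LCP solved
            w = np.array([lam[j] if j in B else 0.0 for j in range(k)])
            z = np.array([0.0 if j in B else lam[j] for j in range(k)])
            return B, w, z
        r = int(negative[0])                      # smallest infeasible index
        MN = basis_matrix(M, set(range(k)) - B)   # complementary columns
        Lam = -np.linalg.solve(MB, MN)            # dictionary of B
        positive = np.flatnonzero(Lam[r] > tol)
        if positive.size == 0:
            return None                           # case (a): infeasible
        s = int(positive[0])
        p = max(r, s)
        D = {p} if abs(Lam[p, p]) > tol else {r, s}   # diagonal / exchange pivot
        B = B ^ D                                 # symmetric set difference
    raise RuntimeError("iteration limit reached")

# Toy QP: minimize x^2 - 4x  s.t.  x >= 1, x >= 0  (optimum x = 2).
M_toy = np.array([[2.0, -1.0], [1.0, 0.0]])
q_toy = np.array([-4.0, -1.0])
B_toy, w_toy, z_toy = criss_cross(M_toy, q_toy)
```

Because no matrix is ever required to be of full rank apart from $M_B$ itself, the same code handles $Q = 0$ (a linear program), where the exchange pivot is used.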

3.3 Varying the Parameter 

We now turn to the case where the right-hand side $q(\mu)$
of (4) varies. Assume that we have solved the problem for
$\mu = \mu_{\min}$ using the criss-cross method, meaning that we now
have a basis $B \subseteq [k]$ such that

\[
\lambda^*_B(\mu) = M_B^{-1} q(\mu) \ge 0.
\]

Since $\lambda^*_B(\mu)$ depends linearly on $\mu$ (assuming that $b$ and $c$ in
(1) are linear functions), we can easily compute the largest
value $\mu' \ge \mu$ such that $\lambda^*_B(\mu') \ge 0$ (we may have $\mu' = \mu$ but
also $\mu' = \infty$).
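
This ratio test can be sketched as follows, under the assumption that the right-hand side is given as $q(\mu) = q_0 + \mu q_1$; the names `next_breakpoint`, `q0` and `q1` are introduced here for illustration only.

```python
import numpy as np

def next_breakpoint(MB, q0, q1, tol=1e-12):
    """Largest mu' up to which lam(mu') = MB^{-1}(q0 + mu' q1) stays >= 0,
    assuming the basis is valid at the current mu. Returns np.inf if the
    basis stays valid for all larger parameter values."""
    a = np.linalg.solve(MB, q0)    # constant part of lambda*_B(mu)
    d = np.linalg.solve(MB, q1)    # slope of lambda*_B(mu)
    decreasing = d < -tol          # only decreasing components can hit zero
    if not np.any(decreasing):
        return np.inf
    return float(np.min(-a[decreasing] / d[decreasing]))

# Toy pQP: minimize x^2  s.t.  x >= mu, x >= 0, i.e. q(mu) = (0, -mu).
M_toy = np.array([[2.0, -1.0], [1.0, 0.0]])
q0 = np.array([0.0, 0.0])
q1 = np.array([0.0, -1.0])
bend = next_breakpoint(np.eye(2), q0, q1)   # basis B = [k], valid for mu <= 0
after = next_breakpoint(-M_toy, q0, q1)     # basis B = {}, valid for mu >= 0
```

In this toy instance the first basis is valid up to the bend at $\mu = 0$, and the second basis remains valid for all larger $\mu$.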

For every value $\kappa \in [\mu, \mu']$, $\lambda^*_B(\kappa)$ is still a solution to (4)
with right-hand side $q(\kappa)$. In order to be able to trace the
solution beyond $\kappa = \mu'$, we apply the criss-cross method to
(4) again, starting from the basis $B$, but now with the right-hand
side $q = q(\mu' + \varepsilon)$, where $\varepsilon$ is a symbolic parameter
meant to represent an arbitrarily small positive value. That
way, we solve a slightly perturbed LCP, starting from a solution
to the old LCP, and in practice, we expect that this
will take only very few iterations. There are no theoretical
guarantees for this, though.¹

In running the criss-cross method on the symbolically perturbed
problem, all values $\lambda^*_j$ whose signs are being used to
check whether we currently have a solution to (4) are linear
polynomials in $\varepsilon$ (dictionary entries that are needed to
check for infeasibility are unaffected by $\varepsilon$). The sign of a
linear polynomial $a + \varepsilon r$ is determined by $a$ if $a \ne 0$, and
by $r$ otherwise.
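
This sign rule amounts to a lexicographic comparison. A minimal sketch, with the hypothetical helper name `sign_eps` and an optional tolerance that is an assumption for floating-point use:

```python
def sign_eps(a, r, tol=0.0):
    """Sign of a + eps * r for an arbitrarily small eps > 0:
    decided by a if a is nonzero, and by r otherwise."""
    for value in (a, r):
        if value > tol:
            return 1
        if value < -tol:
            return -1
    return 0
```

With such a helper, the feasibility test of the criss-cross method carries over to the perturbed problem without ever choosing a numerical value for $\varepsilon$.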

It follows that for the basis $B'$ obtained upon termination
of the criss-cross method, there are $k$-vectors $s$ and $t$ such
that

\[
\lambda^*_{B'}(\mu' + \varepsilon) = s + \varepsilon t,
\quad\text{where } s_j > 0 \text{ or } s_j = 0,\ t_j \ge 0 \quad \forall j \in [k].
\tag{6}
\]

This implies that $\lambda^*_{B'}(\mu' + \varepsilon) \ge 0$ for any sufficiently small
numerical value of $\varepsilon$. In other words, $B'$ is valid throughout
a whole interval $[\mu', \mu' + \varepsilon']$, where $\varepsilon' > 0$ is easy to compute
from (6).

While increasing $\mu$, we therefore subdivide our interval $[\mu_{\min}, \mu_{\max}]$
into pieces over which the solution to (4) and therefore also
the solution to (1) is linear in $\mu$. There are only finitely many
such pieces, since no basis $B$ can repeat (if $B$ is valid for two
values of $\mu$, it is also valid for any intermediate value).



Performance. By the above analysis, our algorithm
calculates the entire solution path of any parametric
quadratic program of the form (1) in finite time. Also, it is well suited
to make use of the sparseness of the solutions, which is a
key property of all regularization methods. When running
the algorithm, the relevant size of the matrices $M_B$ that we
have to deal with is bounded by the number of
non-zero entries in $x$, plus $m$.



Odds and Ends. The solution path computed in the above
way may be discontinuous, since the solution to the LCP
may "jump" when we move from $q(\mu')$ to $q(\mu' + \varepsilon)$. This is
due to the fact that the LCP does not in general have a unique
solution, and the criss-cross method has no control over which
optimal solution it finds. However, if one strictly wants continuity,
one can simply insert connecting straight-line segments:
since both endpoints are solutions for $q(\mu')$ (set
$\varepsilon = 0$), all intermediate points will be solutions as well. This
holds for the $x$-part of $(w, z)$ (the QP solution) by convexity
of the optimal region in (1), but it also holds for $(w, z)$ w.r.t.
the LCP by a result of Adler and Gale [2].

For the above to work, we do not even have to assume that
QP($\mu$) has an optimal solution throughout $[\mu_{\min}, \mu_{\max}]$. Our
method can handle the general case. We may start off at
$\mu = \mu_{\min}$ with an unsolvable LCP (the criss-cross method
will report this), or we may run into an unsolvable situation
later. In order to trace $\mu$ through such a situation, we simply
choose the "next event" as the largest $\mu' \ge \mu$ for which
$(\lambda_B)(\mu')_r < 0$, where $(\lambda_B)_r$ is the variable for which infeasibility
was detected in case (a) of the criss-cross method.

4. CHOICE BASED CONJOINT ANALYSIS 

In general, conjoint analysis includes two tasks: (a) preference
data assessment, and (b) analysis of the assessed data.
In choice based conjoint analysis (CBC), preference data are
assessed on a set of options $A$ in a sequence of choice experiments.
In every choice experiment a consumer has to choose
the most preferred option out of a few options that are presented
to her/him (typically between two and five options
from $A$). The set of all options is assumed to carry a conjoint
structure, i.e., $A$ is the Cartesian product $A = A_1 \times \dots \times A_n$
of parameter sets $A_i$. In the following we assume that the
parameter sets $A_i$ are finite. Choice data are of the form
$a \succ b$, where $a = (a_1, \dots, a_n)$, $b = (b_1, \dots, b_n) \in A$ and $a$
was preferred over $b$ by some consumer in a choice experiment.
Our goal is to compute an interval scale $v : A \to \mathbb{R}$
on the domain $A$ from a set of choice data. The scale $v$
is meant to represent the preferences of the population of
consumers who contributed to the choice experiments, i.e.,
$a \in A$ is more popular than $b \in A$ if $v(a) > v(b)$, and the
difference $v(a) - v(b)$ tells how much more popular $a$ is
than $b$.

In the data analysis stage of conjoint analysis it is almost
always assumed [1], [7] that the scale $v$ is linear, i.e., that it
can be written as

\[
v(a) = v((a_1, \dots, a_n)) = \sum_{i=1}^{n} v_i(a_i), \tag{7}
\]

where the $v_i : A_i \to \mathbb{R}$ are also interval scales, see [22] for a
justification of choosing a linear scale. The value $v_i(a_i)$, $a_i \in A_i$,
is called the part-worth value of level $a_i$, i.e., the value
that it contributes to the overall value of an option $a$ where
the level $a_i$ is present. The goal of choice based conjoint
analysis is to compute/estimate part-worth values for all
attribute levels from choice data.

¹ This complexity behavior is expected to be very similar to
running Simplex steps for a slightly perturbed linear program,
starting from a solution for the original problem.




Regularization approach to compute part-worth values.
Our goal here is to review how computing part-worth
values in choice based conjoint analysis naturally leads to a
geometrically dual formulation of an SVM, see [15] for more
details. Assuming that the scale $v$ is linear, the part-worth
values $v_i(a_{ij}) \in \mathbb{R}$, $a_{ij} \in A_i$, $i = 1, \dots, n$, should satisfy
constraints of the form

\[
\sum_{i=1}^{n} \bigl( v_i(a_i) - v_i(b_i) \bigr) \ge 0, \tag{8}
\]

whenever $a = (a_1, \dots, a_n)$ was preferred over $b = (b_1, \dots, b_n)$
by some consumer in a choice experiment.

Figure 1: An important property of duality that
makes it useful for our application is that the duality
of non-vertical hyperplanes and points preserves
relative positions. Dual points are labeled by lower-case
letters, and dual hyperplanes by capital letters.

Let $m_i = |A_i|$ and $m = \sum_{i=1}^{n} m_i$. Any linear scale $v$ on the domain $A$ is
represented by a vector $(v_i(a_{ij}))_{i=1,\dots,n;\ j=1,\dots,m_i} \in \mathbb{R}^m$, and
a choice experiment is defined by the characteristic vectors
$\chi_a \in \{0, 1\}^m$, whose $i$'th component is 1 if the corresponding
parameter level is present in option $a$, and 0 otherwise.
We can re-write the choice constraints (8) as

\[
v^T (\chi_a - \chi_b) \ge 0, \quad \text{if } a \succ b \text{ in a choice experiment,}
\]

or shorter as $v^T n_{ab} \ge 0$, where $n_{ab} = \chi_a - \chi_b$. A vector
$v$ is called feasible if it satisfies all constraints. The set
of all feasible vectors is in general a (not necessarily
full-dimensional) double cone whose apex is the origin. Among
all the feasible vectors we want to choose one with good
generalization properties. This can be phrased as a two-class
classification problem as follows: let $H_{ab}$ be the hyperplane



\[
H_{ab} = \{ v \in \mathbb{R}^m \mid v^T n_{ab} = 0 \}
\]

with normal $n_{ab}$, and let

\[
H^+_{ab} = \{ v \in \mathbb{R}^m \mid v^T n_{ab} \ge 0 \}
\quad\text{and}\quad
H^-_{ab} = \{ v \in \mathbb{R}^m \mid v^T n_{ab} \le 0 \}
\]

be the two closed halfspaces bounded by $H_{ab}$. Note that
$H^+_{ab} = H^-_{ba}$. If $a$ was preferred over $b$ in a choice experiment,
then we have a constraint of the form $v \in H^+_{ab}$, otherwise,
if $b$ was preferred over $a$ in a choice experiment, then we
have $v \in H^-_{ab}$. That is, we can assign a label $+$, or $-$,
respectively, to the hyperplane $H_{ab}$ depending on the outcome
of a choice experiment for this hyperplane. Since the label
attached to the hyperplane $H_{ba}$ is just the opposite of the
label attached to $H_{ab}$, we can restrict ourselves to one of the
two hyperplanes for every pair $a \ne b \in A$, e.g., by fixing an
arbitrary order on the elements of $A$, and only considering
hyperplanes $H_{ab}$ where $a$ comes before $b$ in this order. That
is, we are given labelled hyperplanes as input and are looking
for a point in the feasible cone that can be written as the
intersection of the halfspaces $H^+_{ab}$ if $a \succ b$ in a choice experiment,
and $H^-_{ab}$ if $b \succ a$ in a choice experiment. In standard
linear two-class classification the situation is the other way
around: we are given labelled points and are looking for a
hyperplane that separates the points according to their labels.
There are several known geometric duality transforms
that map hyperplanes into points and vice versa, see for example
[9], which in principle allow us to transform our problem
of computing part-worth values into a standard two-class
classification problem. The duality transform that we consider
here maps non-vertical (labeled) hyperplanes to (labeled)
points and vice versa, see Figure 1 for an example in $\mathbb{R}^2$.
Since many of the hyperplanes $H_{ab}$ are vertical, i.e., parallel
to the $m$'th coordinate axis, we augment the hyperplane
normals with an $(m + 1)$'th coordinate and set the value of
this coordinate to $\varepsilon > 0$. This leads to a two-class classification
problem that is parameterized by $\varepsilon$. Formulating the
SVM for this problem and taking the limit $\varepsilon \to 0$ leads to
the following QP:



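\[
\mathrm{CBC}\qquad
\begin{array}{ll}
\text{minimize}_{v} & \tfrac{1}{2}\|v\|^2 \\
\text{subject to} & v^T n_{ab} \ge 1, \ \text{if } a \succ b \text{ in a choice experiment.}
\end{array}
\]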
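
The constraint normals $n_{ab} = \chi_a - \chi_b$ are sparse 0/±1 vectors that are easy to build. The following is a small sketch; the helper name `characteristic_vector`, the 0-based level indices and the example attribute sizes are assumptions made for illustration.

```python
import numpy as np

def characteristic_vector(option, level_counts):
    """chi_a in {0,1}^m for an option a = (a_1, ..., a_n), where a_i is the
    0-based index of the level chosen for attribute i."""
    offsets = np.concatenate([[0], np.cumsum(level_counts)[:-1]])
    chi = np.zeros(int(np.sum(level_counts)))
    for i, level in enumerate(option):
        chi[offsets[i] + level] = 1.0
    return chi

# Hypothetical study: two attributes with 3 and 2 levels, so m = 5.
level_counts = [3, 2]
a, b = (0, 1), (2, 1)            # a was preferred over b
n_ab = (characteristic_vector(a, level_counts)
        - characteristic_vector(b, level_counts))

# A made-up part-worth vector v satisfies the choice constraint if v^T n_ab >= 0.
v = np.array([3.0, 1.0, 0.0, 0.0, 2.0])
margin = float(v @ n_ab)
```

Levels shared by both options cancel in $n_{ab}$, which is why the resulting constraint matrix is sparse and of low rank.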



Soft margin formulation. On real data we have to deal
with contradictory information, i.e., observed choices of the
form $a \succ b$ and $b \succ a$, especially when assessing preferences
on a population, but also individuals can be inconsistent
in their choices. With contradictory information we
can proceed as before, with the only difference that after
dualizing we work with a soft margin C-SVM (2) to deal
with the contradictions. This leads to the following pQP:

\[
\mathrm{CBC}(C)\qquad
\begin{array}{ll}
\text{minimize}_{v,\xi} & \tfrac{1}{2}\|v\|^2 + C \sum_{j=1}^{k} \xi_j \\
\text{subject to} & v^T n_{ab} \ge 1 - \xi_j, \ \text{if } a \succ b \text{ in the } j\text{'th choice experiment,} \\
 & \xi_j \ge 0,
\end{array}
\]

with a non-negative slack variable $\xi_j$ for every choice, i.e.,
a constraint $v(a) - v(b) + \xi_j \ge 0$, $\xi_j \ge 0$ if $a$ was preferred
over $b$ in the $j$'th choice experiment. The slack is penalized
in the objective function by the term $C \sum_{j=1}^{k} \xi_j$, assuming
that we have information from $k$ choice experiments, and
$C > 0$ is the standard trade-off parameter between model
complexity and quality of fit on the observed data. The
problem CBC(C) has already been suggested by Evgeniou
et al. [12] to compute part-worth values, but without giving
details why it is well suited for that task. A similar
formulation is also known as the ranking SVM, when
the representing features are the present parameter levels in
each option.

5. EXPERIMENTAL RESULTS 

To test the criss-cross method we provided an experimental 
proof of concept implementation. The implementation at 
its current stage is not really efficient, but the results for 
the number of iterations needed by the criss-cross method 
along the path are promising. We hope to fully exploit this 
behavior with a state of the art implementation in the near 
future. 

We tested the criss-cross method on a choice based conjoint
analysis data set that we obtained in a larger user study to
measure the perceived quality for a visualization task [16]:
the conjoint study had six parameters with 3, 5, 6, 2, 3, and
5 levels, respectively. That is, in total this study comprised
24 levels for which we estimate the part-worth values. To
estimate the part-worth values, the participants of our
study had to provide answers in choice experiments. Hence
the problem CBC(C) leads to a problem QP($\mu$) whose matrix
$Q$ has dimension $(24 + s) \times (24 + s)$, and whose matrix $A$
has dimension $s \times (24 + s)$, where $s$ is the number of choice
experiments. Hence we can essentially control the complexity
of the problem by the number of choice experiments considered.

For 40 choice experiments, exemplary paths need (580, 47, 22)
or (580, 4, 48) iterations of the criss-cross method for the
first three bends on a C-interval $[1, 6.09 \cdot 10^{13}]$. This
indicates that even if a starting point for the solution
path may need some time to be computed (580 steps by
the criss-cross method, but of course other methods could
also be used to compute a starting solution), our
criss-cross method is very effective in continuing the path at
the bends.

6. CONCLUSION 

We have presented a generic solution path algorithm for pa- 
rameterized quadratic programs that works for all regular- 
ization methods that result in a single parametric quadratic 
program, also when the kernel matrix is not of full rank. 

Since the state of the art solution methods in machine learn- 
ing are moving away from finding exact solutions to faster 
approximate methods, it would be an interesting further re- 
search topic to investigate paths of approximate solutions 
of parametrized quadratic programs. Also, it should be 
further investigated how multi-parametric programming ap- 
proaches [33] may help to find several parameters simultane- 
ously, such as the regularization parameter, regression tube 
width, and also kernel parameters [37] . 